[fix](function) fix tokenize function incorrect result when first argument is const by airborne12 · Pull Request #62699 · apache/doris

airborne12 · 2026-04-22T06:58:29Z

Proposed changes

Fix a bug in the tokenize function where unpack_if_const unwraps a ColumnConst to its inner data column (which has only 1 row), but _do_tokenize and _do_tokenize_none iterate over the row count of the column they receive, i.e. that 1-row inner column. As a result, only 1 output row is produced instead of input_rows_count rows when the first argument is a constant.

For example, SELECT tokenize('hello world', 'parser=english') FROM table_with_many_rows would previously return only 1 row instead of one row per row of the table.

The fix wraps the result in ColumnConst when the source column was const, which is the standard pattern used throughout the Doris codebase for handling const columns in function execution.
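The row-count mismatch and the const-wrapping fix can be sketched with a small self-contained model. Everything here is hypothetical: `StringColumn`, `Arg`, `Result`, `unpack_if_const_sketch`, `do_tokenize_sketch` and `toy_tokenize` are illustrative stand-ins, not the real Doris `ColumnString`, `ColumnConst`, `unpack_if_const` or `_do_tokenize`, and the whitespace tokenizer only exists to give each row a value.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a string column (NOT Doris' ColumnString).
struct StringColumn {
    std::vector<std::string> rows;
    std::size_t size() const { return rows.size(); }
};

// Hypothetical argument: when is_const is true, `data` holds exactly 1 row
// that stands for input_rows_count identical logical rows.
struct Arg {
    StringColumn data;
    bool is_const = false;
};

// Hypothetical result: either a full column or a const wrapper of one value.
struct Result {
    StringColumn data;
    bool is_const = false;
    std::size_t logical_rows = 0;
    std::size_t size() const { return is_const ? logical_rows : data.size(); }
    const std::string& at(std::size_t i) const { return data.rows.at(is_const ? 0 : i); }
};

// Plays the role of unpack_if_const: hand back the inner column plus a flag.
std::pair<const StringColumn*, bool> unpack_if_const_sketch(const Arg& a) {
    return {&a.data, a.is_const};
}

// Toy tokenizer (whitespace split), used only to produce a per-row value.
std::string toy_tokenize(const std::string& s) {
    std::istringstream in(s);
    std::string tok, out = "[";
    bool first = true;
    while (in >> tok) {
        out += (first ? "\"" : ", \"") + tok + "\"";
        first = false;
    }
    return out + "]";
}

// Same loop shape as described in the PR: one output row per row of `src`.
StringColumn do_tokenize_sketch(const StringColumn& src) {
    StringColumn dst;
    for (const auto& s : src.rows) dst.rows.push_back(toy_tokenize(s));
    return dst;
}

// Before the fix: the 1-row result of a const argument is returned as-is.
Result execute_buggy(const Arg& arg0, std::size_t /*input_rows_count*/) {
    Result r;
    r.data = do_tokenize_sketch(*unpack_if_const_sketch(arg0).first);
    return r;
}

// After the fix: a const first argument yields a const result of input_rows_count rows.
Result execute_fixed(const Arg& arg0, std::size_t input_rows_count) {
    const auto unpacked = unpack_if_const_sketch(arg0);
    const bool left_const = unpacked.second;
    Result r;
    r.data = do_tokenize_sketch(*unpacked.first);
    if (left_const) {
        r.is_const = true;
        r.logical_rows = input_rows_count;
    }
    return r;
}
```

With a const first argument of logical size 5, `execute_buggy` reports 1 row while `execute_fixed` reports 5, and every logical row reads back the same tokenized value. Wrapping keeps a single physical value instead of copying it N times, which matches the "standard pattern" the description refers to.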

Further comments

Related Jira: DORIS-25296

Checklist(Required)

Does it affect the results of the existing test cases (Yes/No): No
Does it need to update the document (Yes/No): No
Is there a risk of compatibility changes (Yes/No): No

Thearas · 2026-04-22T06:58:36Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

…ument is const

After unpack_if_const, the inner data column of a ColumnConst has only 1 row. The _do_tokenize and _do_tokenize_none functions iterate based on the source column's row count, so when the first argument is a constant (e.g., SELECT tokenize('hello world', 'parser=english') FROM table), only 1 output row was produced instead of input_rows_count rows. Fix by wrapping the result in ColumnConst when the source column was const.

### What problem does this PR solve?

Problem Summary: clang-format-16 reported a formatting deviation in `function_tokenize.cpp` where the `ColumnConst::create(...)` argument fit on the same line as `result`. Join the two arguments on a single line so the file passes `build-support/check-format.sh`.

### Release note

None

### Check List (For Author)

- Test: No need to test (formatting-only change)
- Behavior changed: No
- Does this need documentation: No

airborne12 · 2026-05-25T14:47:38Z

run buildall

hello-stephen · 2026-05-25T15:20:11Z

TPC-H: Total hot run time: 30960 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 05e014fc037f9f3c2adefa784ad01b0e8f0faa7f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17619	3931	4020	3931
q2	q3	10799	1356	799	799
q4	4683	475	357	357
q5	7563	2272	2102	2102
q6	312	170	135	135
q7	970	789	635	635
q8	9384	1824	1632	1632
q9	6913	4972	4921	4921
q10	6442	2242	1889	1889
q11	434	271	238	238
q12	690	429	289	289
q13	18214	3398	2748	2748
q14	272	255	239	239
q15	q16	828	772	703	703
q17	1027	966	966	966
q18	6797	5904	5478	5478
q19	1260	1354	1059	1059
q20	537	403	259	259
q21	5612	2518	2280	2280
q22	426	349	300	300
Total cold run time: 100782 ms
Total hot run time: 30960 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4308	4255	4228	4228
q2	q3	4544	4971	4352	4352
q4	2107	2217	1411	1411
q5	4462	4291	4598	4291
q6	262	208	143	143
q7	2093	1819	1641	1641
q8	2537	2161	2182	2161
q9	8078	7920	8059	7920
q10	4887	4741	4475	4475
q11	590	435	416	416
q12	761	797	545	545
q13	3282	3735	3005	3005
q14	311	300	286	286
q15	q16	713	748	664	664
q17	1322	1357	1344	1344
q18	7933	7546	6919	6919
q19	1127	1124	1067	1067
q20	2227	2222	1955	1955
q21	5331	4638	4490	4490
q22	526	459	406	406
Total cold run time: 57401 ms
Total hot run time: 51719 ms

hello-stephen · 2026-05-25T15:31:11Z

TPC-DS: Total hot run time: 171976 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 05e014fc037f9f3c2adefa784ad01b0e8f0faa7f, data reload: false

query5	4299	658	513	513
query6	343	217	202	202
query7	4247	570	312	312
query8	347	231	215	215
query9	8840	4098	4094	4094
query10	451	343	304	304
query11	5767	2605	2221	2221
query12	184	133	129	129
query13	1312	607	466	466
query14	6172	5524	5198	5198
query14_1	4518	4518	4493	4493
query15	221	211	193	193
query16	1007	414	413	413
query17	1154	759	613	613
query18	2718	501	369	369
query19	223	207	170	170
query20	137	133	129	129
query21	216	142	118	118
query22	13650	13662	13400	13400
query23	17376	16502	16272	16272
query23_1	16474	16304	16302	16302
query24	7485	1794	1273	1273
query24_1	1322	1309	1294	1294
query25	593	509	436	436
query26	1188	323	183	183
query27	2687	580	350	350
query28	4415	2037	2006	2006
query29	1038	654	525	525
query30	303	242	200	200
query31	1130	1081	955	955
query32	104	78	76	76
query33	565	367	305	305
query34	1181	1112	684	684
query35	783	828	689	689
query36	1379	1430	1295	1295
query37	150	100	89	89
query38	3244	3140	3081	3081
query39	925	950	901	901
query39_1	869	872	862	862
query40	219	142	123	123
query41	66	62	67	62
query42	109	106	105	105
query43	329	337	300	300
query44	
query45	215	205	194	194
query46	1080	1193	696	696
query47	2359	2410	2223	2223
query48	410	419	290	290
query49	625	486	374	374
query50	958	351	260	260
query51	4415	4367	4305	4305
query52	101	101	92	92
query53	268	280	203	203
query54	323	270	249	249
query55	92	88	84	84
query56	298	314	299	299
query57	1465	1466	1346	1346
query58	313	273	277	273
query59	1679	1743	1578	1578
query60	320	320	307	307
query61	164	160	160	160
query62	706	645	555	555
query63	249	201	204	201
query64	2374	802	663	663
query65	
query66	1644	494	365	365
query67	30082	29765	29519	29519
query68	
query69	457	337	305	305
query70	1049	1012	993	993
query71	297	275	258	258
query72	3004	2686	2416	2416
query73	853	744	450	450
query74	5132	4936	4787	4787
query75	2699	2595	2262	2262
query76	2307	1128	749	749
query77	423	406	335	335
query78	12308	12327	11850	11850
query79	2213	1083	785	785
query80	1523	553	449	449
query81	534	279	235	235
query82	454	159	116	116
query83	339	274	245	245
query84	258	140	107	107
query85	904	528	449	449
query86	452	350	326	326
query87	3446	3412	3248	3248
query88	3604	2711	2714	2711
query89	442	387	337	337
query90	1901	180	185	180
query91	177	162	141	141
query92	78	78	100	78
query93	1616	1410	804	804
query94	712	368	327	327
query95	673	403	345	345
query96	1057	803	350	350
query97	2726	2713	2616	2616
query98	232	226	224	224
query99	1202	1161	1038	1038
Total cold run time: 255891 ms
Total hot run time: 171976 ms

hello-stephen · 2026-05-25T15:54:51Z

BE UT Coverage Report

Increment line coverage 64.29% (9/14) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.83% (20886/38797)
Line Coverage	37.40% (197814/528947)
Region Coverage	33.70% (155005/459922)
Branch Coverage	34.70% (67487/194480)

hello-stephen · 2026-05-25T17:29:28Z

BE Regression && UT Coverage Report

Increment line coverage 64.29% (9/14) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.77% (28030/37997)
Line Coverage	57.65% (304163/527592)
Region Coverage	54.72% (254088/464345)
Branch Coverage	56.28% (109853/195206)

airborne12 · 2026-05-26T04:43:25Z

/review

github-actions

I found one blocking review issue: this BE function behavior fix does not add regression coverage, and the PR's motivating all-literal example should normally be handled by PreparedFunctionImpl's default constant-argument path before execute_impl reaches the new left_const branch. Please add a regression test that fails before this change and exercises the exact intended path, for example a multi-row query with a constant first argument and valid tokenizer properties, with deterministic ordered output or row-count validation through qt/order_qt output.

Critical checkpoint conclusions:

Goal: intended to fix tokenize row count for const first argument; the code may cover a lower-level const/non-const execution path, but the claimed SQL example is not proven by tests.
Scope: code change is small and localized.
Concurrency/lifecycle/config/compatibility: not involved.
Parallel paths: generic all-constant handling exists and should be considered; no FE/BE protocol changes.
Tests: insufficient; no regression or unit test was added for the behavior change.
Observability/transactions/data writes: not involved.
Performance/memory: no material new hot-path concern from the small wrapper change.

User focus points: none were provided.

github-actions · 2026-05-26T04:47:11Z

                _do_tokenize_none(*col_left, dest_column_ptr);
-                block.replace_by_position(result, std::move(dest_column_ptr));
+                if (left_const) {
+                    block.replace_by_position(


This behavior change needs regression coverage. The PR description uses SELECT tokenize('hello world', 'parser=english') FROM table_with_many_rows, but when both arguments are literals the generic PreparedFunctionImpl::default_implementation_for_constant_arguments should unwrap, execute one row, and wrap the result as a ColumnConst before this left_const branch is reached. Please add a regression test that fails before this patch and exercises the actual intended path, with multiple input rows and deterministic expected output/row count, so we can verify this branch fixes a real user-visible case rather than an untested lower-level edge case.
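The reviewer's point about the generic constant-argument path can be modelled in a few lines. This is a hypothetical sketch, not Doris' `PreparedFunctionImpl`: `Col`, `Trace`, `execute_sketch` and `execute_impl_sketch` are invented names, and the "tokenize" step just copies values. It assumes, as the comment describes, that the generic path fires only when every argument is const, unwraps them, runs one row, and wraps the result; under that assumption an all-literal call never reaches a `left_const` branch, while a const first argument paired with a non-const second argument does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of argument columns; not the Doris IColumn hierarchy.
struct Col {
    std::vector<std::string> rows;   // exactly 1 row when is_const
    bool is_const = false;
    std::size_t logical_rows = 0;    // only meaningful when is_const
    std::size_t size() const { return is_const ? logical_rows : rows.size(); }
};

// Records which path a call took.
struct Trace {
    bool took_all_const_fast_path = false;
    bool reached_left_const_branch = false;
};

// Stand-in for a function's execute_impl: copies arg0's values (toy "tokenize")
// and wraps the result as const when arg0 arrived as a const column.
Col execute_impl_sketch(const std::vector<Col>& args, std::size_t input_rows_count, Trace& t) {
    const Col& arg0 = args.at(0);
    Col out;
    out.rows = arg0.rows;
    if (arg0.is_const) {
        t.reached_left_const_branch = true;
        out.is_const = true;
        out.logical_rows = input_rows_count;
    }
    return out;
}

// Stand-in for generic all-constant handling: only when EVERY argument is const,
// unwrap them, run the implementation on 1 row, and wrap that row as const.
Col execute_sketch(const std::vector<Col>& args, std::size_t input_rows_count, Trace& t) {
    bool all_const = !args.empty();
    for (const auto& a : args) all_const = all_const && a.is_const;
    if (all_const) {
        t.took_all_const_fast_path = true;
        std::vector<Col> unwrapped;
        for (const auto& a : args) unwrapped.push_back(Col{a.rows, false, 0});
        Col one_row = execute_impl_sketch(unwrapped, 1, t);
        return Col{one_row.rows, true, input_rows_count};
    }
    return execute_impl_sketch(args, input_rows_count, t);
}
```

In this model both call shapes return the right row count, but only the mixed shape exercises the branch the PR changed, which is why the thread ends up testing it directly in a BE unit test rather than through SQL.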

Adds regression coverage for the const-first-argument fix in FunctionTokenize::execute_impl. The generic PreparedFunctionImpl::default_implementation_for_constant_arguments path only fires when all arguments are constant, so the original bug example `SELECT tokenize('hello', '"parser"="english"') FROM t` never actually reaches execute_impl. The new tests use a parser_config column to force a non-const second argument, which is the path where the left_const branch matters. Both the analyzer path (parser=english) and the PARSER_NONE early return are covered. Assertions check that row count equals input_rows_count (was 1 before the fix) and that every row carries the expected tokenized value.

airborne12 · 2026-05-27T03:59:45Z

/review

airborne12 · 2026-05-27T04:00:09Z

run buildall

github-actions

I found one blocking issue in the added regression coverage. The BE fix itself is small and matches the local const-column handling pattern, but the new test uses a query shape that FE rejects before BE execution, so it does not provide valid coverage and should fail the regression suite.

Critical checkpoint conclusions:

Goal/test proof: the code aims to preserve row count when tokenize receives a const first argument after unpack_if_const; the implementation addresses the BE row-count issue, but the added regression does not prove it because the second argument is not legal SQL for tokenize.
Scope/focus: the BE code change is small and focused; the test addition is larger than necessary and currently invalid.
Concurrency/lifecycle/config/compatibility: no new concurrency, special lifecycle, config, storage format, or protocol compatibility concerns found.
Parallel paths: both analyzer and PARSER_NONE BE branches were updated consistently.
Error handling/memory/observability: no new unchecked Status, ownership, or observability issue found in the BE change.
Test coverage/results: coverage remains blocking because the new regression queries should fail analysis instead of exercising FunctionTokenize::execute_impl.
User focus: no additional user-provided review focus was specified.

hello-stephen · 2026-05-27T05:03:54Z

BE UT Coverage Report

Increment line coverage 64.29% (9/14) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.87% (20916/38828)
Line Coverage	37.44% (198144/529218)
Region Coverage	33.69% (155109/460338)
Branch Coverage	34.69% (67507/194623)

The earlier regression-test addition was invalid: FE `Tokenize.checkLegalityBeforeTypeCoercion()` requires the second argument to be a `StringLikeLiteral`, so the test query that read the parser config from a table column failed analysis instead of reaching `FunctionTokenize::execute_impl`. And when both arguments are literals `PreparedFunctionImpl::default_implementation_for_constant_arguments` already short-circuits with the all-const fast path, so there is no SQL shape that exercises the new `left_const` branch. Drop the invalid regression test and add a BE unit test that builds the block directly: arg0 is wrapped in a `ColumnConst(size=N)` whose inner data column has 1 row, arg1 is a regular `ColumnString` with N rows. With that shape `all_arguments_are_constant` is false, the generic fast path skips, `execute_impl` is reached, and `unpack_if_const(arg0)` returns `left_const == true`. The assertions check that the result column has N rows (was 1 before the fix) and that every row carries the same tokenized value, for both the analyzer branch and the `PARSER_NONE` early-return branch.
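The assertions this commit describes (N result rows, each carrying the same tokenized value, on both the analyzer branch and the `PARSER_NONE` early return) can be sketched against a toy function. The names `Column`, `finish` and `tokenize_sketch` are hypothetical, the parser is passed as a plain string rather than an N-row `ColumnString`, and the token formats for `"english"` and `"none"` are invented for illustration; the real test in the PR builds Doris blocks and calls `FunctionTokenize::execute_impl`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column model; not Doris' ColumnString / ColumnConst.
struct Column {
    std::vector<std::string> rows;   // 1 row when is_const
    bool is_const = false;
    std::size_t logical_rows = 0;
    std::size_t size() const { return is_const ? logical_rows : rows.size(); }
    const std::string& at(std::size_t i) const { return rows.at(is_const ? 0 : i); }
};

// One shared wrap step, so the early-return branch and the main branch
// restore the logical row count in exactly the same way.
Column finish(std::vector<std::string> out, bool left_const, std::size_t input_rows_count) {
    Column c;
    c.rows = std::move(out);
    if (left_const) {
        c.is_const = true;
        c.logical_rows = input_rows_count;
    }
    return c;
}

// Toy tokenize with two branches; the token formats are illustrative only.
Column tokenize_sketch(const Column& arg0, const std::string& parser,
                       std::size_t input_rows_count) {
    const bool left_const = arg0.is_const;
    std::vector<std::string> out;
    if (parser == "none") {
        // Early-return branch: keep each value as a single token.
        for (const auto& s : arg0.rows) out.push_back("[\"" + s + "\"]");
        return finish(std::move(out), left_const, input_rows_count);
    }
    // Analyzer branch: whitespace split.
    for (const auto& s : arg0.rows) {
        std::istringstream in(s);
        std::string tok, joined = "[";
        bool first = true;
        while (in >> tok) {
            joined += (first ? "\"" : ", \"") + tok + "\"";
            first = false;
        }
        out.push_back(joined + "]");
    }
    return finish(std::move(out), left_const, input_rows_count);
}
```

Routing both returns through one helper is a choice made for this sketch; the review summary in this thread describes the PR as adding a `left_const` check to each branch, which achieves the same consistency.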

airborne12 · 2026-05-27T05:20:10Z

/review

hello-stephen · 2026-05-27T05:22:58Z

TPC-H: Total hot run time: 31259 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 029e1bc8d6a920f1c1ce86993a1e677a578767a8, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17702	4053	4039	4039
q2	q3	10755	1400	807	807
q4	4685	470	350	350
q5	7703	2273	2168	2168
q6	336	175	135	135
q7	973	780	632	632
q8	9483	1648	1482	1482
q9	6624	4997	4989	4989
q10	6455	2217	1874	1874
q11	451	277	244	244
q12	688	434	298	298
q13	18255	3342	2796	2796
q14	262	256	238	238
q15	q16	825	764	703	703
q17	992	961	956	956
q18	6846	5648	5535	5535
q19	1230	1199	1074	1074
q20	613	397	260	260
q21	5746	2515	2379	2379
q22	429	348	300	300
Total cold run time: 101053 ms
Total hot run time: 31259 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4425	4357	4297	4297
q2	q3	4501	4990	4354	4354
q4	2072	2209	1377	1377
q5	4425	4289	4709	4289
q6	256	194	142	142
q7	2067	1839	1697	1697
q8	2430	2162	2189	2162
q9	8018	7967	7877	7877
q10	4877	4770	4509	4509
q11	575	406	378	378
q12	759	752	532	532
q13	3254	3689	3034	3034
q14	292	306	290	290
q15	q16	703	723	670	670
q17	1338	1316	1329	1316
q18	7984	7222	6884	6884
q19	1120	1094	1098	1094
q20	2220	2210	1943	1943
q21	5231	4492	4425	4425
q22	517	471	431	431
Total cold run time: 57064 ms
Total hot run time: 51701 ms

github-actions

Automated review summary for head de5e1d7:

No additional distinct inline issues found in this pass. I read the existing review threads first and did not duplicate the previously raised concerns. The current diff only changes BE tokenize const-result handling and adds BE unit coverage; the previously changed Groovy regression file is no longer part of the current diff.

Critical checkpoint conclusions:

Goal/test: The patch preserves row count when tokenize receives a const first argument after unpack_if_const, and adds BE unit tests for both the analyzer path and parser='none' early-return path.
Scope: The implementation is small and focused on wrapping the single computed tokenized value as ColumnConst(input_rows_count) only when the first argument was const.
Concurrency/lifecycle: No new shared state, locks, threads, static initialization, or lifecycle-sensitive ownership was introduced.
Config/compatibility/protocol: No configuration, storage format, protocol, or FE/BE serialization compatibility changes were introduced.
Parallel paths: Both normal tokenizer and PARSER_NONE early-return paths now handle left_const consistently.
Conditional checks: The added left_const branches follow the existing constant-column semantics and are narrow.
Test coverage/results: New BE unit tests cover the direct BE path. I did not run tests in this review environment.
Observability: No new observability appears necessary for this local scalar-function behavior.
Transaction/persistence/data write correctness: Not applicable; this is a read-only scalar function path.
FE-BE variable passing: Not applicable; no new variables are transmitted.
Performance: The change avoids materializing repeated equal string results and uses ColumnConst, so there is no obvious new hot-path performance issue.

User focus: No additional user-provided review focus was specified, and no extra focus-specific issue was found.

airborne12 · 2026-05-27T05:28:07Z

run buildall

hello-stephen · 2026-05-27T05:33:58Z

TPC-DS: Total hot run time: 171922 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 029e1bc8d6a920f1c1ce86993a1e677a578767a8, data reload: false

query5	4311	655	515	515
query6	326	218	196	196
query7	4210	593	304	304
query8	328	237	221	221
query9	8786	4107	4053	4053
query10	448	347	298	298
query11	5802	2534	2238	2238
query12	189	130	125	125
query13	1310	641	440	440
query14	6201	5494	5184	5184
query14_1	4513	4481	4435	4435
query15	219	209	193	193
query16	990	445	439	439
query17	1255	747	617	617
query18	2482	491	364	364
query19	224	209	171	171
query20	140	131	134	131
query21	222	144	118	118
query22	13572	13561	13340	13340
query23	17369	16524	16145	16145
query23_1	16222	16375	16417	16375
query24	7559	1783	1338	1338
query24_1	1336	1315	1347	1315
query25	576	502	449	449
query26	1316	355	178	178
query27	2664	544	358	358
query28	4449	2033	2010	2010
query29	1015	626	522	522
query30	303	242	198	198
query31	1141	1088	960	960
query32	95	78	78	78
query33	550	361	298	298
query34	1201	1154	668	668
query35	778	808	737	737
query36	1392	1424	1262	1262
query37	155	106	90	90
query38	3207	3149	3050	3050
query39	933	932	925	925
query39_1	887	864	875	864
query40	241	139	120	120
query41	64	62	60	60
query42	109	108	108	108
query43	329	337	287	287
query44	
query45	219	209	200	200
query46	1134	1260	769	769
query47	2495	2474	2311	2311
query48	416	432	296	296
query49	638	500	379	379
query50	1048	366	260	260
query51	4297	4290	4279	4279
query52	107	103	93	93
query53	252	274	204	204
query54	311	270	253	253
query55	93	89	89	89
query56	298	299	318	299
query57	1436	1422	1349	1349
query58	293	272	271	271
query59	1598	1674	1440	1440
query60	312	319	342	319
query61	161	156	156	156
query62	692	646	582	582
query63	249	201	205	201
query64	2394	788	618	618
query65	
query66	1726	477	371	371
query67	29719	29753	29582	29582
query68	
query69	468	349	305	305
query70	1031	984	1008	984
query71	314	280	271	271
query72	2977	2713	2394	2394
query73	843	764	405	405
query74	5118	4971	4794	4794
query75	2695	2620	2263	2263
query76	2337	1141	759	759
query77	396	411	330	330
query78	12477	12355	11844	11844
query79	1487	1002	749	749
query80	1326	536	445	445
query81	510	279	236	236
query82	1324	154	125	125
query83	343	275	250	250
query84	258	145	112	112
query85	938	531	453	453
query86	440	353	312	312
query87	3475	3413	3212	3212
query88	3655	2750	2748	2748
query89	458	393	338	338
query90	1794	182	176	176
query91	179	174	139	139
query92	85	76	75	75
query93	1499	1516	880	880
query94	665	363	307	307
query95	677	472	348	348
query96	1040	822	341	341
query97	2739	2738	2634	2634
query98	228	231	225	225
query99	1207	1143	1038	1038
Total cold run time: 255380 ms
Total hot run time: 171922 ms

hello-stephen · 2026-05-27T05:49:06Z

TPC-H: Total hot run time: 31258 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit de5e1d7eb791a883092b1dedd870def1247e73c4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17728	4060	4056	4056
q2	q3	10765	1376	798	798
q4	4685	476	345	345
q5	7575	2220	2102	2102
q6	235	181	138	138
q7	961	785	643	643
q8	9341	1770	1635	1635
q9	5142	4989	4931	4931
q10	6386	2183	1872	1872
q11	443	271	247	247
q12	628	427	296	296
q13	18080	3386	2743	2743
q14	266	257	235	235
q15	q16	819	784	711	711
q17	1054	999	896	896
q18	7117	5717	5614	5614
q19	1468	1291	1035	1035
q20	547	406	262	262
q21	5884	2613	2403	2403
q22	448	358	296	296
Total cold run time: 99572 ms
Total hot run time: 31258 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4386	4301	4257	4257
q2	q3	4521	4942	4326	4326
q4	2096	2356	1384	1384
q5	4418	4256	4314	4256
q6	231	174	127	127
q7	2481	1878	1699	1699
q8	2596	2241	2234	2234
q9	8080	7933	8037	7933
q10	4807	4839	4270	4270
q11	582	412	389	389
q12	824	755	541	541
q13	3279	3617	2936	2936
q14	303	291	275	275
q15	q16	721	724	626	626
q17	1352	1462	1320	1320
q18	7784	7283	7323	7283
q19	1118	1106	1081	1081
q20	2202	2222	1980	1980
q21	5253	4562	4513	4513
q22	522	449	431	431
Total cold run time: 57556 ms
Total hot run time: 51861 ms

hello-stephen · 2026-05-27T06:00:06Z

TPC-DS: Total hot run time: 172530 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit de5e1d7eb791a883092b1dedd870def1247e73c4, data reload: false

query5	4333	667	527	527
query6	331	215	202	202
query7	4248	573	307	307
query8	320	237	216	216
query9	8830	4122	4106	4106
query10	445	348	299	299
query11	5759	2574	2220	2220
query12	189	129	127	127
query13	1306	624	439	439
query14	6233	5481	5280	5280
query14_1	4481	4490	4422	4422
query15	212	201	188	188
query16	1006	482	444	444
query17	1114	725	579	579
query18	2713	493	353	353
query19	217	199	165	165
query20	143	134	130	130
query21	214	137	122	122
query22	13715	13665	13442	13442
query23	17375	16584	16026	16026
query23_1	16374	16250	16333	16250
query24	7510	1793	1333	1333
query24_1	1364	1344	1345	1344
query25	587	498	445	445
query26	1349	340	183	183
query27	2628	560	358	358
query28	4442	2037	2037	2037
query29	1000	663	524	524
query30	311	242	204	204
query31	1152	1089	960	960
query32	93	80	75	75
query33	569	360	300	300
query34	1167	1135	667	667
query35	784	799	718	718
query36	1382	1414	1252	1252
query37	156	110	96	96
query38	3250	3192	3069	3069
query39	945	910	912	910
query39_1	881	897	894	894
query40	238	155	129	129
query41	72	69	70	69
query42	126	113	114	113
query43	333	335	299	299
query44	
query45	217	212	198	198
query46	1076	1183	726	726
query47	2366	2373	2249	2249
query48	399	417	314	314
query49	655	513	404	404
query50	1056	354	256	256
query51	4394	4381	4329	4329
query52	108	110	97	97
query53	261	286	212	212
query54	346	309	269	269
query55	96	96	88	88
query56	317	325	317	317
query57	1446	1422	1327	1327
query58	307	287	286	286
query59	1606	1670	1451	1451
query60	336	361	321	321
query61	208	155	152	152
query62	689	658	594	594
query63	247	199	208	199
query64	2365	806	610	610
query65	
query66	1663	475	360	360
query67	29713	29716	29567	29567
query68	
query69	461	346	312	312
query70	1022	1006	996	996
query71	318	277	263	263
query72	2919	2714	2406	2406
query73	861	780	437	437
query74	5141	4948	4775	4775
query75	2827	2618	2255	2255
query76	2290	1194	802	802
query77	421	419	328	328
query78	12533	12355	11846	11846
query79	1407	1152	767	767
query80	647	538	469	469
query81	457	272	244	244
query82	1035	164	120	120
query83	371	288	251	251
query84	261	148	113	113
query85	898	556	493	493
query86	392	312	323	312
query87	3457	3393	3272	3272
query88	3725	2775	2761	2761
query89	440	394	349	349
query90	1958	193	187	187
query91	183	166	138	138
query92	83	74	74	74
query93	1441	1447	897	897
query94	536	338	303	303
query95	696	485	354	354
query96	1124	827	347	347
query97	2721	2739	2610	2610
query98	238	235	225	225
query99	1205	1145	1029	1029
Total cold run time: 254808 ms
Total hot run time: 172530 ms

ok

yiguolei · 2026-05-27T07:54:12Z

skip check_coverage

hello-stephen · 2026-05-27T08:04:55Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (14/14) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.85% (28084/38028)
Line Coverage	57.74% (304764/527862)
Region Coverage	54.73% (254361/464761)
Branch Coverage	56.38% (110145/195349)

…ument is const (#62699)

## Proposed changes

Fix a bug in the `tokenize` function where `unpack_if_const` unwraps a `ColumnConst` to its inner data column (which has only 1 row), but `_do_tokenize` and `_do_tokenize_none` iterate based on the source column's row count. This causes only 1 output row to be produced instead of `input_rows_count` rows when the first argument is a constant.

For example, `SELECT tokenize('hello world', 'parser=english') FROM table_with_many_rows` would previously return only 1 row instead of the expected number of rows matching the table.

The fix wraps the result in `ColumnConst` when the source column was const, which is the standard pattern used throughout the Doris codebase for handling const columns in function execution.

## Further comments

Related Jira: DORIS-25296

## Checklist(Required)

1. Does it affect the results of the existing test cases (Yes/No): No
2. Does it need to update the document (Yes/No): No
3. Is there a risk of compatibility changes (Yes/No): No

airborne12 added 2 commits May 25, 2026 22:40

airborne12 force-pushed the fix-tokenize-const-row-count branch from 20601b5 to 05e014f May 25, 2026 14:40

github-actions Bot requested changes May 26, 2026

github-actions Bot previously requested changes May 27, 2026

Comment thread regression-test/suites/inverted_index_p0/test_tokenize.groovy Outdated

github-actions Bot reviewed May 27, 2026

yiguolei added the dev/4.1.x label May 27, 2026

yiguolei approved these changes May 27, 2026

yiguolei merged commit 3884c1a into apache:master May 27, 2026
32 of 33 checks passed

github-actions Bot mentioned this pull request May 27, 2026

branch-4.1: [fix](function) fix tokenize function incorrect result when first argument is const #62699 #63735

Open

4 participants